Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Template's mappings as bytes for disk serialization #78746

Merged
merged 23 commits into from
Oct 19, 2021

Conversation

probakowski
Copy link
Contributor

This change the way we store mappings in Template during serialization to disk - instead storing it as map we use byte array that we already have. This avoids deserialization-serialization cycle during storing cluster state on disk.

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do this more like we do for mappings (maybe even reuse some code from there) to keep things simpler. No need to do any hacks around reading base64 from JSON.

if (uncompressedMapping.size() > 0) {
builder.field(MAPPINGS.getPreferredName());
builder.map(reduceMapping(uncompressedMapping));
if (Metadata.CONTEXT_MODE_GATEWAY.equals(params.param(Metadata.CONTEXT_MODE_PARAM, Metadata.CONTEXT_MODE_API))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this followed the exact same conventions we use for mappings, respecting the binary parameter and serializing as bytes for everything but the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added binary parameter and changed usage to similar to IndexMetada

Object compressed = values.get("compressed");
if (compressed == null) {
return new CompressedXContent(Strings.toString(XContentFactory.jsonBuilder().map(values)));
} else if (compressed instanceof String) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this if we do things the same way we do them for mappings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handles situation when you use binary parameter with JsonXContent - it stores byte arrays as BASE64 encoded strings. Without it ToAndFromJsonMetadataTests#testSimpleJsonFromAndTo fails

@@ -88,6 +89,15 @@ private static String differenceBetweenObjectsIgnoringArrayOrder(String path, Ob
} else {
return path + ": first element is null, the second element is not null";
}
} else if (first instanceof byte[]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably also not necessary to extend things this way if we just do what we do for mappings and have a special binary param path here I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for ESIntegTestCase#ensureClusterStateCanBeReadByNodeTool to pass - this test uses binary serialization and this is the first instance where we have to compare byte[] in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we have that for mappings already, they should be binary serialized as well and appear to run through the exact same code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was different path with parsing IndexMetadata->bytes->IndexMetada and ComposableIndexMetadata->bytes->UnknownMetadataCustom when using ElasticsearchNodeCommand#namedXContentRegistry so results differed when using binary serialization. I changed ElasticsearchNodeCommand to parse ComposableIndexMetadata so the code is no longer needed.

@probakowski probakowski changed the title Store mappings as bytes for disk serialization Store Template's mappings as bytes for disk serialization Oct 11, 2021
@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@probakowski
Copy link
Contributor Author

@original-brownbear I simplified code and made it similar to IndexMetadata, would you mind taking another look?

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure why we have to do anything different than we do with mappings here in the tests and in part of production code. Can you explain what's different here?

new CompressedXContent(Strings.toString(XContentFactory.jsonBuilder().map(p.mapOrdered()))), MAPPINGS);
PARSER.declareField(ConstructingObjectParser.optionalConstructorArg(), (p, c) -> {
XContentParser.Token token = p.currentToken();
if (token == XContentParser.Token.VALUE_STRING) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we encode the template like this? This seems broken?
We don't seem to run into a base64 JSON string anywhere for mappings so we shouldn't for templates?

Copy link
Contributor Author

@probakowski probakowski Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shows up with combination of JsonXContent and binary parameter. I think we have that only in ToAndFromJsonMetadataTests#testSimpleJsonFromAndTo. Thing is we don't serialize IndexMetadata there because Metadata.toXContent will filter them out if context is not XContentContext.API so we don't have to serialize mappings for them as well. Test like below would fail exactly because we don't handle VALUE_STRING case in IndexMetadata, we just don't test it. It would work with SmileXContent.

IndexMetadata.Builder indexMetadataBuilder = IndexMetadata.builder("test12")
    .settings(settings(Version.CURRENT))
    .numberOfShards(1)
    .numberOfReplicas(0)
    .putMapping("{\"mapping1\":{\"text1\":{\"type\":\"string\"}}}");

Map<String, String> params = new HashMap<>(2);
params.put("binary", "true");
params.put(Metadata.CONTEXT_MODE_PARAM, Metadata.CONTEXT_MODE_GATEWAY);

XContentBuilder builder = JsonXContent.contentBuilder();
builder.startObject();
indexMetadataBuilder.build().toXContent(builder, new ToXContent.MapParams(params));
builder.endObject();

IndexMetadata indexMetadata = IndexMetadata.fromXContent(createParser(builder));
assertNotNull(indexMetadata.mapping());

@@ -88,6 +89,15 @@ private static String differenceBetweenObjectsIgnoringArrayOrder(String path, Ob
} else {
return path + ": first element is null, the second element is not null";
}
} else if (first instanceof byte[]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we have that for mappings already, they should be binary serialized as well and appear to run through the exact same code?

@probakowski probakowski self-assigned this Oct 14, 2021
@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski
Copy link
Contributor Author

@elasticmachine update branch

@probakowski probakowski added auto-backport-and-merge auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) labels Oct 19, 2021
@elasticsearchmachine elasticsearchmachine merged commit 675c1f4 into elastic:master Oct 19, 2021
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
7.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 78746

probakowski added a commit to probakowski/elasticsearch that referenced this pull request Oct 19, 2021
)

This change the way we store mappings in `Template` during serialization
to disk - instead storing it as map we use byte array that we already
have. This avoids deserialization-serialization cycle during storing
cluster state on disk.
# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java
@probakowski probakowski deleted the template-fix branch October 19, 2021 22:28
elasticsearchmachine pushed a commit that referenced this pull request Oct 19, 2021
…79522)

This change the way we store mappings in `Template` during serialization
to disk - instead storing it as map we use byte array that we already
have. This avoids deserialization-serialization cycle during storing
cluster state on disk.
# Conflicts:
#	server/src/main/java/org/elasticsearch/cluster/coordination/ElasticsearchNodeCommand.java
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Oct 20, 2021
* upstream/master: (24 commits)
  Implement framework for migrating system indices (elastic#78951)
  Improve transient settings deprecation message (elastic#79504)
  Remove getValue and getValues from Field (elastic#79516)
  Store Template's mappings as bytes for disk serialization (elastic#78746)
  [ML] Add queue_capacity setting to start deployment API (elastic#79433)
  [ML] muting rest compat test issue elastic#79518 (elastic#79519)
  Avoid redundant available indices check (elastic#76540)
  Re-enable BWC tests
  TEST Ensure password 14 chars length on Kerberos FIPS tests (elastic#79496)
  [DOCS] Temporarily remove APM links (elastic#79411)
  Fix CCSDuelIT for skipped shards (elastic#79490)
  Add other time accounting in HotThreads (elastic#79392)
  Add deprecation info API entries for deprecated monitoring settings (elastic#78799)
  Add note in breaking changes for nameid_format (elastic#77785)
  Use 'migration' instead of 'upgrade' in GET system feature migration status responses (elastic#79302)
  Upgrade lucene version 8b68bf60c98 (elastic#79461)
  Use Strings#EMPTY_ARRAY (elastic#79452)
  Quicker shared cache file preallocation (elastic#79447)
  [ML] Removing some code that's obsolete for 8.0 (elastic#79444)
  Ensure indexing_data CCR requests are compressed (elastic#79413)
  ...
@danhermann danhermann added the :Data Management/Indices APIs APIs to create and manage indices and templates label Dec 3, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Dec 3, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Data Management/Indices APIs APIs to create and manage indices and templates >enhancement Team:Data Management Meta label for data/management team v7.16.0 v8.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants